feat(slo): new slo architecture #172224

Merged
merged 52 commits into from
Dec 12, 2023

@kdelemme kdelemme commented Nov 29, 2023

🍒 Summary

This PR introduces a breaking change in the SLO feature. We are moving away from global summary transforms (there were 10 of them), and instead install the following resources per SLO definition:

  • rollup transform, e.g. slo-{id}-{revision}
  • summary transform, e.g. slo-summary-{id}-{revision}
  • summary ingest pipeline, e.g. .slo-observability.summary.pipeline-{id}-{revision}
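The naming scheme in the list above can be sketched as a small helper. This is only an illustration: `getSloResourceNames` and the `SloResourceNames` interface are hypothetical names, not functions from the PR; only the three name patterns come from the description.

```typescript
// Hypothetical helper: derives the per-SLO resource names from the patterns
// listed in the PR description. Not the actual Kibana implementation.
interface SloResourceNames {
  rollupTransformId: string;
  summaryTransformId: string;
  summaryPipelineId: string;
}

function getSloResourceNames(sloId: string, revision: number): SloResourceNames {
  return {
    rollupTransformId: `slo-${sloId}-${revision}`,
    summaryTransformId: `slo-summary-${sloId}-${revision}`,
    summaryPipelineId: `.slo-observability.summary.pipeline-${sloId}-${revision}`,
  };
}

const exampleNames = getSloResourceNames('abc123', 2);
console.log(exampleNames.rollupTransformId); // slo-abc123-2
```

Because the revision is part of every name, bumping the revision naturally yields a fresh set of resources next to the old ones, which can then be removed.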

The rollup transform is set up with a common rollup ingest pipeline, .slo-observability.sli.pipeline-v${SLO_RESOURCES_VERSION}, which sets the event.ingested timestamp and splits the destination index per month (as we were already doing before). The summary transforms sync on this new event.ingested field in the rollup data. This should prevent issues caused by delays in the rollup transform, e.g. slow queries from CCS (cross-cluster search).
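A rough sketch of what such a rollup ingest pipeline body could look like, using the Elasticsearch `set` and `date_index_name` ingest processors. The function name, the `indexPrefix` value, the version number, and the `date_formats` choice are all assumptions for illustration; the PR's actual pipeline may differ.

```typescript
// Minimal shapes for an ingest pipeline body (illustrative, not the Kibana types).
interface IngestProcessor {
  [processorType: string]: Record<string, unknown>;
}

interface IngestPipeline {
  id: string;
  description: string;
  processors: IngestProcessor[];
}

// Hypothetical builder. `resourcesVersion` stands in for SLO_RESOURCES_VERSION and
// `indexPrefix` is a placeholder for the rollup destination index prefix.
function buildSliRollupPipeline(resourcesVersion: number, indexPrefix: string): IngestPipeline {
  return {
    id: `.slo-observability.sli.pipeline-v${resourcesVersion}`,
    description: 'Stamp event.ingested and route rollup documents to a monthly index',
    processors: [
      // Record when the rollup document was ingested; the summary transform syncs on this field.
      { set: { field: 'event.ingested', value: '{{{_ingest.timestamp}}}' } },
      // Split the destination index per month based on the document timestamp.
      {
        date_index_name: {
          field: '@timestamp',
          index_name_prefix: indexPrefix,
          date_rounding: 'M',
          date_formats: ['ISO8601'], // assumed format; adjust to the real timestamp format
        },
      },
    ],
  };
}

const exampleRollupPipeline = buildSliRollupPipeline(3, '.example-sli-rollup.');
console.log(exampleRollupPipeline.id); // .slo-observability.sli.pipeline-v3
```

Syncing the summary transform on the ingest time rather than the event time means late-arriving rollup documents are still picked up.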

The summary transform uses simpler queries and fewer groupBy fields, which should make the transform run faster.
The summary ingest pipeline sets the SLO definition metadata when the transform updates the summary document.
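As a sketch of the idea, a per-SLO summary pipeline could stamp static definition metadata onto each summary document with `set` processors. The field names `slo.name` and `slo.objective.target` are borrowed from the runtime mappings removed in this PR's test snapshots; that they are set by the pipeline, and the `SloMetadata` shape itself, are assumptions.

```typescript
// Hypothetical, simplified view of the SLO definition fields used as metadata.
interface SloMetadata {
  id: string;
  revision: number;
  name: string;
  objectiveTarget: number;
}

interface SetProcessor {
  set: { field: string; value: string | number };
}

interface SummaryPipeline {
  id: string;
  processors: SetProcessor[];
}

// Builds one pipeline per SLO revision; each processor writes a constant value,
// so the transform itself no longer needs to compute or group by these fields.
function buildSummaryPipeline(slo: SloMetadata): SummaryPipeline {
  return {
    id: `.slo-observability.summary.pipeline-${slo.id}-${slo.revision}`,
    processors: [
      { set: { field: 'slo.name', value: slo.name } },
      { set: { field: 'slo.objective.target', value: slo.objectiveTarget } },
    ],
  };
}

const exampleSummaryPipeline = buildSummaryPipeline({
  id: 'abc123',
  revision: 1,
  name: 'Checkout availability',
  objectiveTarget: 0.999,
});
console.log(exampleSummaryPipeline.id); // .slo-observability.summary.pipeline-abc123-1
```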

Summary documents are now space-aware. This fixes a bug where SLOs created in different spaces were returned by the summary search client and only filtered out afterwards, which produced erroneous pagination.
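The pagination problem can be illustrated with a small in-memory model. This is not the summary search client; `SummaryDoc`, `paginateThenFilter`, and `filterThenPaginate` are hypothetical names that only show why filtering by space must happen before paging.

```typescript
interface SummaryDoc {
  sloId: string;
  spaceId: string;
}

interface PagedResult<T> {
  total: number;
  results: T[];
}

function paginate<T>(items: T[], pageNumber: number, perPage: number): PagedResult<T> {
  const start = (pageNumber - 1) * perPage;
  return { total: items.length, results: items.slice(start, start + perPage) };
}

// Old behaviour (simplified): page over every space, then drop foreign documents.
// The total still counts other spaces and a page can come back short or empty.
function paginateThenFilter(docs: SummaryDoc[], spaceId: string, pageNumber: number, perPage: number): PagedResult<SummaryDoc> {
  const paged = paginate(docs, pageNumber, perPage);
  return { total: paged.total, results: paged.results.filter((doc) => doc.spaceId === spaceId) };
}

// Space-aware behaviour: restrict to the space first, then page.
function filterThenPaginate(docs: SummaryDoc[], spaceId: string, pageNumber: number, perPage: number): PagedResult<SummaryDoc> {
  return paginate(docs.filter((doc) => doc.spaceId === spaceId), pageNumber, perPage);
}

const exampleDocs: SummaryDoc[] = [
  { sloId: 'a', spaceId: 'default' },
  { sloId: 'b', spaceId: 'other' },
];
console.log(paginateThenFilter(exampleDocs, 'other', 1, 1)); // { total: 2, results: [] }
console.log(filterThenPaginate(exampleDocs, 'other', 1, 1)); // { total: 1, results: [ { sloId: 'b', spaceId: 'other' } ] }
```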

🧪 Testing

Upgrading kibana from main to this branch
When starting kibana on the new branch after having started kibana on main, the new resources (indices, pipeline) should be created successfully.
Each existing SLO should have its saved object (SO) migrated (version added in the saved object definition).
Each existing SLO should be resettable through the API, POST /api/observability/slos/{id}/_reset, and should then appear on the SLO listing page again.

  • Tested on stateful
  • Tested on stateless
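For the reset step in the upgrade scenario above, a request against the `_reset` endpoint could be assembled as follows. This is a sketch under assumptions: the helper name, the API-key authentication, and the optional `/s/{spaceId}` prefix (Kibana's usual space-scoped URL convention) are illustrative, and the helper only builds a request description rather than sending it.

```typescript
interface ResetSloRequest {
  method: 'POST';
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: describes the HTTP call for resetting one SLO.
function buildResetSloRequest(
  kibanaUrl: string,
  sloId: string,
  apiKey: string,
  spaceId?: string
): ResetSloRequest {
  const base = kibanaUrl.replace(/\/+$/, ''); // drop trailing slashes
  const spacePrefix = spaceId ? `/s/${encodeURIComponent(spaceId)}` : '';
  return {
    method: 'POST',
    url: `${base}${spacePrefix}/api/observability/slos/${encodeURIComponent(sloId)}/_reset`,
    headers: {
      Authorization: `ApiKey ${apiKey}`,
      // Kibana rejects non-GET API calls that lack this header.
      'kbn-xsrf': 'true',
    },
  };
}

const exampleReset = buildResetSloRequest('https://kibana.example/', 'abc123', 'my-api-key');
console.log(exampleReset.url); // https://kibana.example/api/observability/slos/abc123/_reset
```

The returned object can be handed to any HTTP client, which keeps the sketch independent of a particular runtime.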

Restarting Kibana
When restarting kibana, the resources installation should not fail.

  • Tested on stateful
  • Tested on stateless

Updating an SLO
If the change is made to a field that requires a revision bump, e.g. indicator, timeWindow, budgetingMethod, settings, or objective, the revision will be bumped and two new transforms (rollup and summary) should be created for the new revision, while the two previous transforms should be deleted.

  • Tested on stateful
  • Tested on stateless
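The revision-bump rule described in the update scenario can be sketched like this. The field list is taken from the description above (which says "e.g.", so it may not be exhaustive); the `SloDefinitionLike` shape, the function names, and the JSON-based comparison are assumptions, not the PR's implementation.

```typescript
// Simplified, hypothetical view of an SLO definition.
interface SloDefinitionLike {
  name: string;
  indicator: unknown;
  timeWindow: unknown;
  budgetingMethod: string;
  settings: unknown;
  objective: unknown;
  revision: number;
}

type RevisionField = 'indicator' | 'timeWindow' | 'budgetingMethod' | 'settings' | 'objective';

const REVISION_BUMP_FIELDS: RevisionField[] = [
  'indicator',
  'timeWindow',
  'budgetingMethod',
  'settings',
  'objective',
];

// Naive structural comparison via JSON; sensitive to key order, good enough for a sketch.
function requiresRevisionBump(previous: SloDefinitionLike, updated: SloDefinitionLike): boolean {
  return REVISION_BUMP_FIELDS.some(
    (field) => JSON.stringify(previous[field]) !== JSON.stringify(updated[field])
  );
}

function applySloUpdate(previous: SloDefinitionLike, changes: Partial<SloDefinitionLike>): SloDefinitionLike {
  const merged: SloDefinitionLike = { ...previous, ...changes, revision: previous.revision };
  return requiresRevisionBump(previous, merged)
    ? { ...merged, revision: previous.revision + 1 }
    : merged;
}
```

When the revision changes, the caller would install the transforms for the new revision and delete those of the previous one; a rename alone leaves the revision untouched.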

SLO space aware
Two SLOs created in different spaces should each be hidden from the other space; the total in the paginated response should be 1 (one SLO per space).

Creating a serverless deployment

Set your API_KEY and ENV_URL, then create a new deployment

curl "${ENV_URL}/api/v1/serverless/projects/observability" \
       -H "Authorization: ApiKey $API_KEY" \
       -H "Content-Type: application/json" \
       -XPOST -d '{
          "name": "test-slo",
          "region_id": "aws-eu-west-1"
       }'

Update an existing deployment (find the deployment_id either in the previous response, or in the cloud console) with this PR image:

curl "${ENV_URL}/api/v1/serverless/projects/observability/DEPLOYMENT_ID" \
       -H "Authorization: ApiKey $API_KEY" \
       -H "Content-Type: application/json" \
       -XPUT -d '{
          "name": "test-slo",
          "overrides": {
            "kibana": {
                "docker_image": "docker.elastic.co/kibana-ci/kibana-serverless:pr-172224-58a55cc1afee"
            }
          }
       }'

or you can create a serverless deployment directly with that PR image:

curl "${ENV_URL}/api/v1/serverless/projects/observability" \
       -H "Authorization: ApiKey $API_KEY" \
       -H "Content-Type: application/json" \
       -XPOST -d '{
          "name": "test-slo-2",
          "region_id": "aws-eu-west-1",
          "overrides": {
            "kibana": {
                "docker_image": "docker.elastic.co/kibana-ci/kibana-serverless:pr-172224-58a55cc1afee"
            }
          }
       }'

Run the High cardinality indexer:

DATASET="fake_stack" EVENTS_PER_CYCLE=300 EVENT_TEMPLATE=good INDEX_INTERVAL=60000 LOOKBACK=now-6h/h ELASTICSEARCH_HOSTS="URL TO ELASTICSEARCH" ELASTICSEARCH_API_KEY="API_KEY CREATED IN KIBANA" SERVERLESS=1 yarn start

Release note

We introduce a breaking change in the SLO feature that will break any SLOs created before 8.12. These SLOs will have to be manually reset through an API until we provide a UI for it. The data aggregated over time (rollup) will still be available in the sli v2 index, but won't be used for summary calculation after a reset.

The previous summary transforms, which summarized every SLO, won't be used anymore and can be stopped and deleted:

  • slo-summary-occurrences-7d-rolling
  • slo-summary-occurrences-30d-rolling
  • slo-summary-occurrences-90d-rolling
  • slo-summary-occurrences-monthly-aligned
  • slo-summary-occurrences-weekly-aligned
  • slo-summary-timeslices-7d-rolling
  • slo-summary-timeslices-30d-rolling
  • slo-summary-timeslices-90d-rolling
  • slo-summary-timeslices-monthly-aligned
  • slo-summary-timeslices-weekly-aligned

Be aware that when installing a new SLO (or after resetting an SLO), we install two transforms (one for the rollup data and one that summarizes the rollup data). Do not delete the new slo-summary-{slo_id}-{slo_revision} transforms.
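One way to script the cleanup of the ten legacy transforms is to generate their IDs and the matching Elasticsearch transform API calls (stop, then delete). The helper names are hypothetical and the sketch only produces a request plan; it deliberately matches fixed legacy names so that the new per-SLO slo-summary-{slo_id}-{slo_revision} transforms are never touched.

```typescript
const LEGACY_BUDGETING_METHODS = ['occurrences', 'timeslices'];
const LEGACY_TIME_WINDOWS = [
  '7d-rolling',
  '30d-rolling',
  '90d-rolling',
  'monthly-aligned',
  'weekly-aligned',
];

// Rebuilds the ten legacy global summary transform IDs listed in the release note.
function legacySummaryTransformIds(): string[] {
  const ids: string[] = [];
  for (const method of LEGACY_BUDGETING_METHODS) {
    for (const timeWindow of LEGACY_TIME_WINDOWS) {
      ids.push(`slo-summary-${method}-${timeWindow}`);
    }
  }
  return ids;
}

interface EsRequestStep {
  method: 'POST' | 'DELETE';
  path: string;
}

// A transform is stopped before it is deleted.
function buildLegacyCleanupPlan(): EsRequestStep[] {
  const steps: EsRequestStep[] = [];
  for (const id of legacySummaryTransformIds()) {
    steps.push({ method: 'POST', path: `/_transform/${id}/_stop` });
    steps.push({ method: 'DELETE', path: `/_transform/${id}` });
  }
  return steps;
}

console.log(buildLegacyCleanupPlan().length); // 20
```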

@apmmachine

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kdelemme kdelemme force-pushed the slo-new-architecture branch 2 times, most recently from 037095c to 51caec2 Compare November 30, 2023 21:18
@kdelemme kdelemme force-pushed the slo-new-architecture branch from 1801e01 to f3a8c11 Compare December 1, 2023 17:23
simianhacker added a commit that referenced this pull request Dec 1, 2023
## Summary

This PR fixes #172372 by adding the `date_formats` attribute to the
`date_index_name` pipeline step for the SLI ingest pipeline that every
SLO runs through. This PR is only for 8.11, the fix for main will be
included with: #172224
@shahzad31

When I view the details page, the status says it's violated, but in the list it says no data. I did a reset for this SLO via the API.

Violated in details page

image

No data in list view
image

mgiota commented Dec 11, 2023

@kdelemme Thanks for the clear explanation regarding cloning the SLO definition and not the SLO instance.

And you are right, with the new flow for cloning an SLO this will become less problematic: the user can change it before saving the cloned SLO.

@mgiota mgiota left a comment

LGTM, didn't manage to break it

Comment on lines -538 to -549
"slo.name": Object {
  "script": Object {
    "source": "emit('irrelevant')",
  },
  "type": "keyword",
},
"slo.objective.target": Object {
  "script": Object {
    "source": "emit(0.999)",
  },
  "type": "double",
},
Contributor

What's the purpose of the remaining runtime_mappings in the source index?
I am still seeing slo.id, slo.revision, and slo.instanceId in runtime_mappings.

Contributor Author

So we are injecting these fields through runtime_mappings so we can groupBy on them in the aggregation. The source data does not have slo.id, slo.revision, etc., so we have to inject them in order to group on them :) We asked the transform team to provide a more efficient way of doing this, e.g. "static fields".
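To make the injection concrete, here is a sketch of runtime mappings that emit constant values, mirroring the `emit(...)` scripts visible in the snapshot above. The builder name, the `RuntimeFieldSketch` type, and the `long` type chosen for slo.revision are assumptions; slo.instanceId is left out because its source is not described here.

```typescript
interface RuntimeFieldSketch {
  type: 'keyword' | 'long' | 'double';
  script: { source: string };
}

// Hypothetical builder: each runtime field emits a constant, so every source
// document appears to carry the SLO id and revision and can be grouped on them.
function buildSloRuntimeMappings(sloId: string, revision: number): Record<string, RuntimeFieldSketch> {
  // Escape backslashes and single quotes so the id stays a valid Painless string literal.
  const escapedId = sloId.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return {
    'slo.id': { type: 'keyword', script: { source: `emit('${escapedId}')` } },
    'slo.revision': { type: 'long', script: { source: `emit(${revision})` } },
  };
}

console.log(JSON.stringify(buildSloRuntimeMappings('abc123', 2)));
```

The resulting object would sit under `runtime_mappings` in the transform's source definition, next to the index and query.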

Comment on lines +28 to +50
query: {
  bool: {
    filter: [
      {
        range: {
          '@timestamp': {
            gte: `now-${slo.timeWindow.duration.format()}/m`,
            lte: 'now/m',
          },
        },
      },
      {
        term: {
          'slo.id': slo.id,
        },
      },
      {
        term: {
          'slo.revision': slo.revision,
        },
      },
    ],
  },
Contributor

I wonder whether it makes sense to specify a sort on the search here.

Contributor Author

What would you sort on? And I think the transform takes care of that with the date histogram and other group by?

@shahzad31 shahzad31 left a comment

LGTM !!

kibana-ci commented Dec 11, 2023

💚 Build Succeeded

  • Buildkite Build
  • Commit: 193e0f6
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-172224-193e0f6c08e2

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
observability 546 567 +21

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/slo-schema 128 144 +16

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
observability 1.1MB 1.1MB -14.1KB

Canvas Sharable Runtime

The Canvas "shareable runtime" is a bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed Canvas and therefore should be as slim as possible.

id before after diff
module count - 5713 +5713
total size - 5.9MB +5.9MB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
observability 102.6KB 102.6KB +28.0B

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

id before after diff
slo 10 11 +1
Unknown metric groups

API count

id before after diff
@kbn/slo-schema 131 144 +13

async chunk count

id before after diff
observability 21 22 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@simianhacker simianhacker left a comment

LGTM

@kdelemme kdelemme added v8.13.0 backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) labels Dec 12, 2023
@kdelemme kdelemme merged commit b51304f into elastic:main Dec 12, 2023
41 checks passed
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Dec 12, 2023
@kibanamachine

💚 All backports created successfully

Status Branch Result
8.12

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Dec 12, 2023
# Backport

This will backport the following commits from `main` to `8.12`:
- [feat(slo): new slo architecture
(#172224)](#172224)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co>
simianhacker added a commit that referenced this pull request Dec 12, 2023
## Summary

This PR is a follow up to #172224, it adds a UI for resetting the SLO
definitions from the previous model. Once #179473 is merged I will
rebase this against `main` and convert it from a "draft" PR to "ready to
review".


![image](https://github.com/elastic/kibana/assets/41702/daf0591c-272f-40c2-9831-658d7b9b1378)


![image](https://github.com/elastic/kibana/assets/41702/d385396d-d840-4574-942a-b8e51ce66066)


![image](https://github.com/elastic/kibana/assets/41702/729df2a0-61e6-41b3-aaa5-8501e7aa7797)


### Testing

1. Start by loading `main`
2. Ingest some data
3. Create some SLOs
4. Change Kibana from `main` to this PR
5. Visit the SLO page, you should see a banner similar to the screen
shot above.
6. Do your best to break this

---------

Co-authored-by: shahzad31 <shahzad31comp@gmail.com>
Co-authored-by: Dominique Clarke <doclarke71@gmail.com>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Dec 12, 2023
(cherry picked from commit c2003d9)
kibanamachine added a commit that referenced this pull request Dec 13, 2023
# Backport

This will backport the following commits from `main` to `8.12`:
- [[SLO] Reset UI for updating outdated SLOs
(#172883)](#172883)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Chris Cowan <chris@elastic.co>
Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co>
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) ci:build-serverless-image Feature:SLO release_note:breaking Team:obs-ux-management Observability Management User Experience Team v8.12.0 v8.13.0